(54) An image processing method and apparatus.

(57) A real time output containing data relating to states of facial parts is generated. A facial area detection unit (11) has monitoring (51-57) and determining (59-61) processing circuits operating in a pipelining manner to determine the position of the facial area. The monitoring circuits (51-57) monitor pixel value frequency using 3D histogram and backprojection processing. The generated facial area signal has masks applied by a unit (12) which supplies data to mouth area and eye area detection units (14,15). Each of these operates on similar principles to the facial area detection unit (11).

The invention relates to image processing, and
more particularly to processing of images of facial ex- 
pressions. 

European Patent Specification No. EP 
474,307A2 (Philips) describes a system having 
means for receiving facial image data and processing 
means for processing this data. In more detail, this 
specification describes a method of tracking an ob- 
ject such as a face. An initial template of the face is 
formed and a mask outlining the face is extracted. 
The mask is then divided into a number of sub-tem- 
plates which are not associated with specific features 
of the subject. Each successive frame is searched to determine matches. While this method appears to be effective for tracking an overall object such as a face, it does not provide the necessary data relating to state (location, orientation, open/closed status etc.) of parts of the face and other parts such as fingertips. Accordingly, it would appear to have limited applicability and not be suitable for such tasks as automatic generation of sign language data and generation of displays to assist in understanding and communicating sign language.

An object of the invention is to provide for proc- 
essing of facial images in real time to identify states 
of facial parts. Achievement of this objective leads to 
many useful applications such as sign language rec- 
ognition, automatic cartoon animation, or improved 
human-machine interfacing for handicapped people. 
Another object is to provide for processing of hand im- 
ages in conjunction with facial images for assistance 
in sign language understanding and communication, 
for example. 

In this specification, the phrase "sign language" 
relates to not only what is conventionally referred to 
as sign language for deaf people but also to recogni- 
tion of mouth movements to assist in lip reading. The 
term "recognition" is used in a general manner refer- 
ring to a broad range of activities including full recog- 
nition of sign language to screen display of sign lan- 
guage movements to assist in recognition by the user. 
Further, the term "colour" is intended to cover all colours including white, black and grey.

The invention is characterised in that- 
said receiving means comprises means for re- 
ceiving a colour facial image data stream in digital for- 
mat; 

said processing means comprises:- 

a plurality of image data processing cir- 
cuits interconnected for processing the image data in 
a pipelining manner to provide a real time output, the 
circuits comprising :- 

means for monitoring colour val- 
ues in the image data to identify image data pixels 
representing facial parts, and 

means for determining states of the facial parts by monitoring positional coordinates of the identified pixels; and the apparatus further comprises:-

an output device for outputting the de- 
termined facial part state data in real time. 

Preferably, the processing circuits comprise means for monitoring frequency of occurrence of pixel values in the image data.

Preferably, the monitoring means comprises means for carrying out colour histogram matching to monitor frequency of occurrence of pixel values in the image data.

In one embodiment, the monitoring means com- 
prises means for carrying out three-dimensional col- 
our histogram matching to monitor frequency of oc- 
currence of pixel values in the image data. 

Preferably the monitoring means further comprises a back projection means for comparing a generated histogram with a template histogram generated off-line.

In another embodiment, the determining means further comprises a counter for determining the area of the facial part or parts.

Ideally, the apparatus further comprises means 
for normalising the received colour facial image data 
stream. 

Preferably, the processing circuits comprise :-

a facial area detection unit comprising circuits for carrying out facial area detection operations on the image data; and

a facial part detection unit connected to said facial area detection unit and comprising processing circuits for determining states of a facial part within the detected facial area using the image data and an output signal from said facial area detection unit.

In another embodiment, the apparatus further comprises a mask generating unit connected between the facial area and facial part detection units.

In one embodiment, the apparatus comprises two facial part detection units, namely :-

a mouth area detection unit; and
an eye area detection unit.

Ideally, the apparatus further comprises a mask generating unit connected between the facial area detection unit and the facial part detection units, said mask generating unit comprising means for generating an eyes mask signal and a mouth mask signal.

In this latter embodiment, the mouth area and eye area detection units are connected in the apparatus to operate on the same image data in parallel.

Ideally, the apparatus further comprises means for carrying out a consistency check to validate determined positional data.

Preferably, each processing unit comprises means for carrying out a consistency check on positional data which it generates.

Preferably, the consistency check means comprises a processor programmed to compare the positional data with reference positional data.

EP 0 654 749 A2

In another embodiment, the monitoring means comprises means for carrying out binary erosion and binary dilation steps to generate image data in which noise is reduced and the subject area is recovered.

In a further embodiment, the apparatus further comprises image data processing circuits for determining location of a coloured fingertip represented in the input image data.

In one embodiment, the apparatus comprises image processing circuits for facial area and separate image processing circuits for coloured fingertips, said circuits being connected to process the input image data in parallel.

Preferably, the apparatus further comprises a result processor comprising means for receiving facial part and coloured fingertip positional data to generate an output signal representing proximity of a coloured fingertip to facial parts for assistance in communication of sign language.

The apparatus may comprise a video device for generating the image data stream, which may be a camera having an analog to digital converter.

In another aspect, the invention provides a method of processing facial image data, the method comprising the steps of:

receiving a digital colour image data stream;

monitoring colour values in the image data to identify image data pixels representing facial parts;

determining states of the facial parts by monitoring positional coordinates of the identified pixels, said monitoring and determining steps being carried out by processing circuits interconnected for processing the image data in a pipelining manner; and

outputting in real time a signal representing the determined facial part state data.

In this aspect, the monitoring and determining steps may be carried out to initially identify facial area and position, and subsequently to identify facial parts and determine states of the facial parts.

Preferably, the monitoring step comprises the sub-steps of monitoring frequency of occurrence of pixel values in the image data.

In another embodiment, the monitoring step comprises the sub-steps of carrying out colour histogram matching to monitor frequency of occurrence of pixel values in the image data.

Ideally, the monitoring step comprises the sub- 
steps of carrying out three-dimensional colour histo- 
gram matching to monitor frequency of occurrence of 
pixel values in the image data. 

In another embodiment, the monitoring step comprises the sub-steps of carrying out three-dimensional colour histogram matching in which there is backprojection with comparison of a generated three-dimensional colour histogram with a template histogram generated off-line.

The invention will be more clearly understood from the following description of some embodiments thereof, given by way of example only with reference to the accompanying drawings, in which :-

Figs. 1(a), 1(b) and 1(c) are outline views of an image processing system of the invention;
Fig. 2(a) is an overview flow chart of a method carried out by the system and Fig. 2(b) is a table showing processing parameters in the flow chart;
Figs. 3(a), (b) and (c) are diagrams illustrating normalisation of a captured RGB image;
Fig. 4 is a flow diagram showing facial area detection steps and processing parameters;
Figs. 5 to 9 inclusive are block diagrams showing a facial area detection unit of the system;
Figs. 10 to 21 inclusive are diagrams showing in more detail the manner in which the steps of Fig. 2(a) for facial area detection are carried out;
Fig. 22 is a flow diagram showing coloured fingertip detection steps;
Figs. 23(a) and 23(b) are block diagrams of a unit for coloured fingertip and mouth area detection;
Figs. 24 and 25 are diagrams showing some coloured fingertip detection steps in detail;
Fig. 26 and Figs. 27(a) and (b) are block diagrams showing a unit for generation of facial part masks;
Figs. 28 and 29 are diagrams showing facial part mask detection and input image masking steps in detail;
Fig. 30 is a flow diagram showing mouth area detection steps;
Figs. 31(a) and (b) are flow diagrams showing eye area detection steps;
Figs. 32(a), (b), (c) and 33 are block diagrams showing a unit for eye area detection;
Figs. 34 to 37 are diagrams showing eye area detection steps in more detail;
Fig. 38 is a schematic diagram showing a communications support system of the invention to help deaf people to lip read, and Fig. 39 is a flow diagram showing operation of this system; and
Fig. 40 is a schematic diagram showing a machine interface for lip reading and Fig. 41 is a flow diagram showing operation of this system.

Referring to the drawings, and initially to Figs. 1(a), 1(b) and 1(c), there is shown an image processing system 1 of the invention. The system 1 comprises a video camera 2 connected to an image processing device 3 controlling a video screen 4. The device 3 is connected to a workstation 5 having processing circuits to assist in decision-making in the system 1. The primary purpose of the system 1 is to monitor movement of facial parts and to provide information on the proximity of a person's fingertips to facial parts. These are important aspects of sign language. A sample screen display 6 is shown in Fig. 1(a). Fig. 1(b) shows an effective arrangement for the camera 2 whereby it is mounted on a frame 7 next to a light source 8 at a lower level than the subject's face and is directed upwardly in the direction of the line A. This ensures that the face is always in the field of view, even when the subject's head has been lowered.

The system 1 operates to identify facial parts and fingertips and determines states for them according to their locations, orientation, open/shut status etc. This data is generated in real time and the system 1 is therefore useful for such applications as :-

(a) sign language recognition, where the positional relationship of the fingertip to facial feature helps to determine the sign gesture,

(b) automatic cartoon animation, where cartoon characters can be generated in real-time using the results of the identification to give realistic facial gestures, and

(c) human-machine interfacing, where the facial part movement such as that of the eyes can be used to control a screen cursor, and where the movement of the mouth could be used to act as a switch, for use in applications where the user only has control of facial part movements.

Referring now to Fig. 1(c) in particular, the general construction of the system 1 is illustrated. The camera 2 is connected to the device 3 at a normalisation unit 10 which is for elimination of noise caused by the ambient illumination conditions. An RGB image is outputted by the normalisation unit 10 to a sub-system comprising units 11 and 12 for detection of facial area and for generation of facial part masks. The normalisation unit 10 also delivers RGB image input signals to a fingertip detection unit 13.

The facial part mask unit 12 is connected to provide masking data to both a mouth area detection unit 14 and an eye area detection unit 15. The units 13, 14, and 15 are all connected at their outputs to the workstation 5 which automatically interprets sign language signal content of the received data. The output signal may be used as desired, according to the particular application.

Before describing construction and operation of the system 1 in detail, important functional aspects are now briefly described. The input is a stream of digital colour (in this case RGB) image data and this is processed by the circuits in the various units in a pipelining manner, i.e. successive operations are carried out on one or more data streams, individual operations being relatively simple. Processing in this manner helps to achieve real time processing. Accordingly, the output data may be used very effectively for a number of applications, some of which are outlined above.

Another important aspect is that the major units in the device 3 each include circuits which fall into two main categories, namely monitoring and determining circuits. The monitoring circuits monitor the colour values of the image data stream (and more particularly, pixel value frequencies) to identify subjects, while the determining circuits use this data to determine position and area data for the subjects. For the unit 11 the subject is the facial area generally, for the unit 13 a coloured fingertip, for the unit 14 the mouth area, and for the unit 15 the eyes. In other embodiments, there is only facial area and mouth area detection; however, the underlying technical features remain the same. The arrangement of units providing general followed by specific subject image data processing also helps to reduce complexity and achieve real time processing. Another important feature is provision of mask data between the general and specific subsequent processing, in this embodiment generated by the unit 12. This improves efficiency of subsequent processing.

Instead of a camera, the primary image signal may be provided by any suitable video device such as a video tape player. The digital image data may alternatively be received from an external source, which may be remote.

Referring now to Figs. 2-37 operation of the sys- 
tem 1 is now described in detail. The overall method 
of operation is illustrated in flow chart format in Fig. 
2(a) and 2(b). 

In step 20 of the method, the next frame image 
is captured by writing an analog pixel stream in the 
camera 2 between synchronisation pulses to an im- 
age memory via an A/D converter within the camera 
2. This provides the initial digital input RGB image sig- 
nal. 

In step 21, normalisation takes place. This involves inputting the captured RGB image to a large look-up table in SRAM as illustrated in Fig. 3(a) and generating a normalised RGB signal image by translation of the RGB value of each pixel according to the formulae of Fig. 3(b) or Fig. 3(c). The look-up table is pre-coded with coefficients for translation of the RGB values. The coefficients of the look-up table are calculated using either of two functions given in Figs. 3(b) and 3(c). The 24 bit RGB image data is made up of three 8 bit values, one representing R, one representing G, and one representing B. These three values are combined for one 24 bit address into the look-up table. Using the formulae, it is possible to calculate new 8 bit values representing Normalised R, Normalised G, and Normalised B. These three 8 bit data values are combined to form a single 24 bit data signal which is outputted as the Normalised RGB image. The coefficients are pre-coded into the table and down-loaded to the SRAM 10 during initialisation. Alternatively, they may be pre-coded by programming the coefficients into a PROM at time of manufacture.
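
The look-up table addressing described above may be sketched in software as follows. This is only an illustration: the formulae of Figs. 3(b) and 3(c) are not reproduced in the text, so the function normalise_chromaticity below is an assumed stand-in (each channel divided by the pixel's total intensity), and the names pack_rgb, unpack_rgb, build_lut and the bits parameter are hypothetical. The bits parameter allows a small table to be built for demonstration; bits=8 corresponds to the 24 bit address described.

```python
def pack_rgb(r, g, b, bits=8):
    """Combine three channel values into a single look-up address."""
    return (r << (2 * bits)) | (g << bits) | b


def unpack_rgb(word, bits=8):
    """Split a packed word back into its three channel values."""
    mask = (1 << bits) - 1
    return (word >> (2 * bits)) & mask, (word >> bits) & mask, word & mask


def normalise_chromaticity(r, g, b, bits=8):
    """ASSUMED formula (not taken from Fig. 3): scale by total intensity."""
    total = r + g + b
    top = (1 << bits) - 1
    if total == 0:
        return 0, 0, 0
    return (top * r) // total, (top * g) // total, (top * b) // total


def build_lut(bits=8, formula=normalise_chromaticity):
    """Pre-code the table once, as would be done at initialisation."""
    size = 1 << bits
    lut = [0] * (size ** 3)
    for r in range(size):
        for g in range(size):
            for b in range(size):
                address = pack_rgb(r, g, b, bits)
                lut[address] = pack_rgb(*formula(r, g, b, bits), bits)
    return lut


def normalise_image(pixels, lut, bits=8):
    """Translate every pixel with a single table read per pixel."""
    return [unpack_rgb(lut[pack_rgb(r, g, b, bits)], bits) for r, g, b in pixels]
```

At run time each pixel then costs one memory read regardless of how complicated the chosen formula is, which is the property the table is used for.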

Where it is desired to use less memory a more 
complex circuit having a number of smaller look-up ta- 
bles having discrete logic could be used, whereby 
processing is broken down into several steps. This ar- 
rangement requires less memory but provides a lower 
resolution. 

In step 22 the facial area of the captured image is detected by the unit 11. Step 22 is illustrated in more detail in Fig. 4 in which the steps 22(a) to 22(n) are shown and processing parameters are indicated. The facial area detection unit 11 is shown in more detail in Fig. 5. The unit 11 comprises a 3D histogram and backprojection circuit 51 connected to a smoothing #1 circuit 52. The former circuit is shown in detail in Fig. 6, the latter in Fig. 7(a).

Operation of the circuit 51 can be divided into 
three phases of operation, namely X, Y and Z. The op- 
eration is as follows. 

Phase X 

The input image is routed via a multiplexor (MUX) 
to a memory (MEM) which is used to store an Image 
Colour Histogram. The pixel values are used as an ad- 
dress to lookup a histogram value which is loaded in 
a COUNTER. The counter is then incremented to in- 
dicate that another pixel whose value is within this his- 
togram box has been encountered, and then this val- 
ue is re-written back into the Image Colour Histogram 
memory. This process is repeated until all pixels in 
the input image have been processed. A histogram of 
the pixel values now resides in the Image Colour His- 
togram memory. It should be noted that this memory 
must be initialized to zero at the start of each frame. 

Phase Y 

In this phase an Internal Addr Gen is used to provide a memory address to the Image Colour Histogram memory and a Model Colour Histogram memory. The Model Colour Histogram is loaded with a histogram of the model colour off-line, using a workstation to down-load the values into memory. The multiplexor is set so that the Internal Addr Gen addresses the Image Colour Histogram. The Internal Addr Gen generates addresses for all of the histogram boxes and simultaneously the values for the Image Colour Histogram and the Model Colour Histogram are output to the Divider Unit. The Divider Unit, which can be either an ALU or a LUT, divides these two numbers and stores the result in a Ratio Colour Histogram memory. The address for the Ratio Colour Histogram memory is also supplied by the Internal Addr Gen via another multiplexor (MUX). The process continues until all boxes in the histogram have been processed.

Phase Z 

In this phase the address to the Ratio Colour His- 
togram memory is supplied by the pixel values in the 
original input image, using the multiplexor. The input 
image is used to lookup the ratio histogram values 
and output these values to the next processing stage. 
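
The three phases may be sketched in software as follows, under stated assumptions. Pixels are assumed to have been reduced already to histogram box addresses, and the function names are hypothetical. The text does not fix the operand order used by the Divider Unit, so this sketch substitutes the well-known Swain and Ballard ratio, min(model / image, 1); it should not be read as the circuit's actual rule.

```python
def phase_x(pixels, n_boxes):
    """Phase X: accumulate the Image Colour Histogram."""
    hist = [0] * n_boxes          # memory is zeroed at the start of each frame
    for p in pixels:
        hist[p] += 1              # read the count, increment, write back
    return hist


def phase_y(image_hist, model_hist):
    """Phase Y: step through every box and store a ratio.

    ASSUMPTION: ratio = min(model / image, 1), with 0 for empty image boxes.
    """
    ratio = []
    for image_count, model_count in zip(image_hist, model_hist):
        if image_count == 0:
            ratio.append(0.0)
        else:
            ratio.append(min(model_count / image_count, 1.0))
    return ratio


def phase_z(pixels, ratio_hist):
    """Phase Z: use each pixel value as the address into the ratio memory."""
    return [ratio_hist[p] for p in pixels]


pixels = [0, 0, 1, 2, 2, 2]
image_hist = phase_x(pixels, n_boxes=4)
ratio_hist = phase_y(image_hist, model_hist=[1, 0, 3, 1])
backprojected = phase_z(pixels, ratio_hist)
```

Note that the input image is read twice, once in Phase X and once in Phase Z, which is why the multiplexor switches the histogram address source between phases.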

A histogram circuit 54 (shown in detail in Fig. 7(b)) and a threshold circuit 53 (shown in detail in Fig. 7(c)) are connected to a binary erosion circuit 55 (see Fig. 8(a)). The unit 11 further comprises a binary dilation circuit 56 similar to the binary erosion circuit 55, and a binary dilation circuit 57 (see Fig. 8(b)). There is also a projection X-Y circuit 59 (see Fig. 9). Finally, the unit 11 comprises a counter 60 and an ALU 61. The signal flows between these circuits are illustrated in Fig. 5, in which the binary erosion circuit is shown twice for conformity with the signal flows.

The normalised input image signal has R, G and B components, the position in the data stream representing the x, y position. The RGB image signal is inputted to the circuit 51 and the RGB data is separated as shown in Fig. 10 into its R, G and B components. Using the component values, a histogram of colour space is derived by incrementing the particular box the RGB values point to. The function of the 3D histogram is to indicate the relative frequency with which one part of the 3D space is occupied in relation to another part of the same 3D space. In this example the 3D space represents RGB colour space, with the value (0,0,0) indicating black, and (255,255,255) indicating white. If the input image was only white, and the Bucket Size is set to 1, then all boxes except (255,255,255) will be zero. The function of the variable Bucket Size is to allow a group of similar colours to be represented by a single box. This reduces the number of boxes in the 3D histogram and therefore reduces the complexity of the implementation.
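
The effect of the Bucket Size may be illustrated by the following sketch. It assumes that a box is selected by integer division of each 8 bit component by the bucket size, and it stores the histogram sparsely in a dictionary for brevity rather than in a fixed memory; the function names are hypothetical.

```python
def box_index(r, g, b, bucket_size=1):
    """Map an RGB value to the histogram box that covers it (assumed rule)."""
    return (r // bucket_size, g // bucket_size, b // bucket_size)


def boxes_per_axis(bucket_size):
    """Number of boxes along each colour axis for 8 bit components."""
    return -(-256 // bucket_size)   # ceiling division


def histogram_3d(pixels, bucket_size=1):
    """Count how many pixels fall into each occupied box."""
    hist = {}
    for r, g, b in pixels:
        key = box_index(r, g, b, bucket_size)
        hist[key] = hist.get(key, 0) + 1
    return hist
```

With a bucket size of 32, for example, each axis has 8 boxes, so the full histogram needs 512 entries instead of the 16,777,216 required with a bucket size of 1.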

After all pixels in the input image are processed, the box in the histogram which has the highest value represents the colour (or colours, if the box in the histogram covers more than a single, discrete colour) which occurs most frequently in the input image. The second highest value represents the second most frequent colour, and so on. In backprojection as shown in Fig. 11 the resulting histogram is then compared with a template histogram of the facial part which is to be identified to provide an output BP signal which indicates the degree of possibility that the pixel is in the facial area. A 3D colour histogram of the mouth template is calculated off-line. The resulting histogram represents the frequency of colours within the template image. Those boxes which have colours which are highly representative of the mouth have high values, whilst those boxes which contain no representative colours contain zero. By dividing the template histogram value by the resulting histogram value in those boxes where the template histogram value is greater than the resulting histogram value, and by dividing the resulting histogram value by the template histogram value in those boxes where the template histogram value is less than the resulting histogram value, a new ratio histogram can be formed which indicates which colours in the input image are likely to belong to the mouth. Finally by using the input RGB image as a look-up address into this ratio histogram, an output image can be formed which highlights those pixels which are most likely to belong to the mouth.

The next step (also carried out by the circuit 51) is reduction of the image size by sub-sampling the pixel stream to reduce complexity of processing as shown in detail in Fig. 12. This step involves the circuit 51 carrying out sum and average operations on the input image.
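
A minimal sketch of sub-sampling by sum and average follows. The block size (factor) and the use of non-overlapping square blocks are assumptions, since the details are given only in Fig. 12; the function name is hypothetical.

```python
def subsample(image, factor=2):
    """Replace each factor x factor block of pixels by its integer average."""
    height, width = len(image), len(image[0])
    out = []
    # Any incomplete blocks at the right and bottom edges are dropped.
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            total = sum(image[y + dy][x + dx]
                        for dy in range(factor)
                        for dx in range(factor))
            row.append(total // (factor * factor))
        out.append(row)
    return out


small = subsample([[1, 3, 5, 7],
                   [1, 3, 5, 7],
                   [2, 2, 8, 8],
                   [4, 4, 0, 0]])
```

Halving each dimension in this way leaves a quarter of the pixels for every later stage to process.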

The circuit 52 then carries out smoothing operations as shown in Fig. 13 to reduce noise and increase the reliability of the correct threshold function. This involves convolution in which there is multiplication by weights and summation. In more detail, the function of the circuit 52 is to obtain an average value over a group of pixels, whose central pixel is used as the reference pixel. In the simplest case, all pixels are summed and then divided by the number of pixels. This is equivalent to multiplying all pixels by a coefficient of 1 and then summing. However, there are a multitude of smoothing algorithms where the unitary coefficients can be replaced with weight coefficients so that pixels closest to the reference pixel have greater influence on the result than those further away. Figure 13 shows some typical examples.

Operation of the circuit 52 can be explained in the following way. The input image enters the circuit shown in Fig. 7(a) at the beginning of the 256 bit shift register. If the correlation surface is N x M, then there are M 256 bit shift registers and N delay elements in each row. The function of each delay element is simply to store a pixel value for 1 clock cycle and then pass it to the next delay element. Each output from the delay elements is multiplied by a constant, the constants being loaded into the circuit 52 off-line using the workstation. The results from all of these multiplications are then summed before being outputted.
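
The multiply-and-sum operation of the smoothing circuit may be sketched as follows. This is a software analogue only: it assumes an odd-sized window of pre-normalised weights centred on the reference pixel and treats pixels outside the image as zero, neither of which is stated in the text. The function name and the example weights are hypothetical.

```python
def smooth(image, weights):
    """Weighted sum over a window centred on each reference pixel."""
    kh, kw = len(weights), len(weights[0])
    oy, ox = kh // 2, kw // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + j - oy, x + i - ox
                    # ASSUMPTION: pixels outside the image contribute zero.
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += weights[j][i] * image[yy][xx]
            out[y][x] = acc
    return out


# Centre-weighted 3 x 3 window: nearer pixels have greater influence.
weights = [[1 / 16, 2 / 16, 1 / 16],
           [2 / 16, 4 / 16, 2 / 16],
           [1 / 16, 2 / 16, 1 / 16]]
impulse = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
spread = smooth(impulse, weights)
```

Replacing every weight with 1/9 gives the simplest case mentioned above, in which all pixels in the window are summed and divided by their number.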

The circuits 54 and 53 (shown in Figs. 7(b) and 7(c)) then generate a histogram as shown in Fig. 14 to obtain the threshold, the output being a set of values of frequency against pixel values as shown in Fig. 15. There is then thresholding in the area (Xs,Ys), (Xe,Ye) to generate a binary BP image as shown in Fig. 16. The subscript "s" represents start and "e" represents end. As Fig. 14 shows, the function of the circuit 54 is to build a histogram of the pixel values, with the most commonly occurring pixel value having the highest frequency. An implementation is shown in Fig. 7(b) where the pixel value is used as an address to an SRAM to look up the number of times the pixel value has occurred. This value is then loaded into a counter where it is incremented and then written back into the SRAM. After all pixels have been inputted into the circuit 54 and histogram values calculated, the histogram values are read into an ALU (Arithmetic Logic Unit) in the circuit 54 where a threshold value is calculated. In the circuit 54, the ALU is used to search for the maximum value by reading every histogram value in the SRAM. When the maximum value has been found it is then multiplied by a constant, usually less than 1, to produce a threshold value which is outputted to the next processing stage.

The circuit 53, shown in Fig. 7(c), takes as input the pixel stream and using a comparator compares each pixel against the threshold calculated in the circuit 54. If the pixel value is greater than or equal to the threshold the comparator outputs a 1, else 0. In this fashion, a binary image is formed.
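
The histogram, threshold search and comparator stages may be sketched as follows. The text leaves open whether the "maximum value" found by the ALU is a frequency or a pixel value; this sketch assumes it is the largest pixel value that actually occurs, because the comparator operates on pixel values. The constant k and all function names are hypothetical.

```python
def pixel_histogram(pixels, levels=256):
    """Use each pixel value as an address and count its occurrences."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def threshold_value(hist, k=0.5):
    """ASSUMED reading: threshold = k x largest pixel value that occurs."""
    peak = max((value for value, count in enumerate(hist) if count), default=0)
    return k * peak


def binarise(pixels, threshold):
    """Comparator: output 1 where pixel >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in pixels]


pixels = [0, 10, 200, 120, 100, 99]
threshold = threshold_value(pixel_histogram(pixels), k=0.5)
binary = binarise(pixels, threshold)
```

Deriving the threshold from each frame's own histogram lets the cut-off follow changes in the backprojected signal level from frame to frame.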

The circuit 55 then carries out binary erosion as shown in Fig. 17 to eliminate random noise from the image data. This is followed by binary dilation as shown in Figs. 18(a) and 18(b) to recover the area size. These steps are followed by a further binary erosion step as shown diagrammatically in Fig. 5.
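
Binary erosion followed by binary dilation may be sketched as follows. A 3 x 3 square neighbourhood is assumed, as is the treatment of pixels outside the image as 0; the actual neighbourhood is defined only in Figs. 17 and 18. The function names are hypothetical.

```python
def _window(img, y, x):
    """3 x 3 neighbourhood of (x, y); outside the image counts as 0."""
    height, width = len(img), len(img[0])
    return [img[yy][xx] if 0 <= yy < height and 0 <= xx < width else 0
            for yy in (y - 1, y, y + 1)
            for xx in (x - 1, x, x + 1)]


def erode(img):
    """A pixel survives only if its whole neighbourhood is set."""
    return [[1 if all(_window(img, y, x)) else 0
             for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img):
    """A pixel is set if any pixel in its neighbourhood is set."""
    return [[1 if any(_window(img, y, x)) else 0
             for x in range(len(img[0]))]
            for y in range(len(img))]


noisy = [[0, 0, 0, 0, 0, 1],    # isolated noise pixel at top right
         [0, 1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0, 0],
         [0, 0, 0, 0, 0, 0]]
eroded = erode(noisy)
recovered = dilate(eroded)
```

In this example erosion removes the isolated pixel and shrinks the block to its centre, and dilation then restores the block to its original size without the noise.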

The circuit 59 then carries out projection operations to generate projection X and projection Y values for the facial area as shown in Fig. 19 to obtain the position of the facial area edges. A projection search is then carried out by the ALU 61 to check representation of the pixels after binary erosion to generate location coordinates (Xsf,Ysf), (Xef,Yef), the subscripts indicating the coordinates of a box in which the face is located in the image.
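
The projection and projection search steps may be sketched as follows. The sketch assumes that projection X is the count of set pixels in each column, projection Y the count in each row, and that the search takes the first and last positions whose count reaches a minimum (min_count, a hypothetical parameter). The function names are also hypothetical.

```python
def projections(binary):
    """Sum the binary image down its columns (X) and along its rows (Y)."""
    proj_x = [sum(column) for column in zip(*binary)]
    proj_y = [sum(row) for row in binary]
    return proj_x, proj_y


def search(projection, min_count=1):
    """First and last index whose count reaches min_count, or None."""
    hits = [i for i, count in enumerate(projection) if count >= min_count]
    return (hits[0], hits[-1]) if hits else None


def bounding_box(binary, min_count=1):
    """Return (Xsf, Ysf, Xef, Yef), or None when nothing is found."""
    proj_x, proj_y = projections(binary)
    x_range = search(proj_x, min_count)
    y_range = search(proj_y, min_count)
    if x_range is None or y_range is None:
        return None
    return x_range[0], y_range[0], x_range[1], y_range[1]


face = [[0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0]]
box = bounding_box(face)
```

Working on two one-dimensional projections rather than the full image keeps the search itself very small, which suits a pipelined implementation.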

Finally, area counting by the counter 60 as shown in detail in Fig. 21 takes place to generate a value for the area of the face. This signal is outputted to the unit 12. The data which is outputted includes the area where the face is to be found in the image.

Referring again to the overall flow chart of Fig. 2, a consistency check 23 is carried out by the ALU 61 whereby the position and area values generated in the facial area detection step 22 are monitored for consistency and an accept or reject signal is generated. The criterion used is the area of the identified box. If the area of the box is zero or very small (of the order of only several pixels) this indicates that no face was found in the image. In this event, there is no need to proceed with any additional processing with this frame and therefore the next frame can be processed.
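
Area counting and the consistency check may be sketched as follows. The box is taken as (Xsf, Ysf, Xef, Yef) with inclusive coordinates, and the minimum acceptable area (min_area) is a hypothetical value standing in for "only several pixels"; the function names are likewise hypothetical.

```python
def count_area(binary, box):
    """Counter: number of set pixels inside the box."""
    xs, ys, xe, ye = box
    return sum(binary[y][x]
               for y in range(ys, ye + 1)
               for x in range(xs, xe + 1))


def consistency_check(box, min_area=16):
    """Accept (True) or reject (False) a detection from its box area."""
    if box is None:
        return False
    xs, ys, xe, ye = box
    box_area = (xe - xs + 1) * (ye - ys + 1)
    return box_area >= min_area


binary = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 0, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
area = count_area(binary, (1, 1, 3, 3))
accepted = consistency_check((1, 1, 3, 3), min_area=4)
```

A rejected frame can be abandoned immediately, so none of the later mask, mouth or eye stages is run on it.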

In step 24 the detected area is displayed and this 
appears in a square box on the screen 4. 

In step 25 there is coloured fingertip detection, the sub-steps 25(a) to 25(m) being illustrated in detail in Fig. 22. The input is the normalised RGB image and the output is a set of position and area values, as with the facial area detection step 22.

The unit 13 to implement these steps is shown in Fig. 23(a) in which parts similar to those described with reference to Fig. 5 are identified by the same reference numerals. The unit 13 is similar to the mouth area detection unit 14, and indeed the relevant inputs are both indicated, the separate RGB image inputs being indicated by interrupted lines. The unit 13 has an additional smoothing #2 circuit 62, illustrated in Fig. 23(a), which operates as shown in Fig. 24. There is an additional input to the histogram circuit 54 from an ALU 63 to restrict the area which the histogram is made from. Smoothing #2 and the additional input to the histogram circuit 54 may not be required if quality of the data input is high.

The smoothing shown in Fig. 24 involves averaging an area of pixels around a central reference pixel. All smoothing coefficients are unitary, which amounts to all pixels within the specified area being summed and then divided by the number of pixels. The purpose of this step is to help identify a single location where the fingertip is situated. If smoothing was not performed, then it is possible that there would be several pixels which have the maximum value, determined as shown in Fig. 25. It is then difficult to decide which pixel is the true position of the fingertip. By smoothing, an average of the pixels in a given area can be obtained. The position where the fingertip is located will have many pixels in close proximity to one another with high values, whereas areas where the fingertip is not located may have a few high value pixels sparsely separated. After averaging with the smoothing filter, the area of the fingertip is enhanced and it is easier to locate the true position of the fingertip.
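
The effect of smoothing #2 on the maximum value search may be illustrated as follows. The window radius is an assumed parameter, windows at the image border are assumed to be clipped to the pixels that exist, and the function names are hypothetical.

```python
def box_average(image, radius=1):
    """Unitary-coefficient smoothing: mean of the window around each pixel."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [image[yy][xx]
                      for yy in range(max(0, y - radius), min(height, y + radius + 1))
                      for xx in range(max(0, x - radius), min(width, x + radius + 1))]
            out[y][x] = sum(values) / len(values)
    return out


def max_value_search(image):
    """Return (x, y) of the first pixel holding the maximum value."""
    best_value, best_pos = None, None
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_pos = value, (x, y)
    return best_pos


image = [[0] * 6 for _ in range(6)]
image[0][0] = 9                      # isolated high-value noise pixel
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = 8              # dense cluster standing in for the fingertip
raw_position = max_value_search(image)
smoothed_position = max_value_search(box_average(image))
```

Without smoothing the search settles on the isolated pixel at (0, 0); after smoothing it settles on the centre of the dense cluster at (3, 3).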

A consistency check is carried out by the ALU 61 in step 26 for acceptance or rejection of the generated position and area data for the coloured fingertips. The detected area is displayed in step 27. A tip area check is carried out in step 28 to determine if the area of the fingertip is within a set tolerance At+At-. If the tip area check is positive, a flag TFLAG is set to 1 in step 29, otherwise it is set to 0.

The unit 12 then operates to detect a facial part mask in step 30. The unit 12 is shown in more detail in Figs. 26, 27(a) and 27(b) and the manner in which this step is carried out is illustrated in Fig. 28. This step involves generation of position masks for the mouth and for the eyes using the detected facial area data. Samples of the generated masks are shown at the bottom of Fig. 28. Step 31 involves masking the input image using the normalised RGB image, the mouth mask and the eyes mask as shown in Fig. 29. These masks restrict the area to search for the mouth and eyes to increase the recognition rate. The mask data is fed into the units 14 and 15.

The hardware for the mask generation is shown in Fig. 27(a). The input into the component 65 is the Skin Area Box (Xsf, Ysf, Xef, Yef). Essentially, the Mouth Area and Eye Area are found by splitting this box into two equal regions, so that the Eye Area is specified by the box (Xsf, Ysf, Xef, (Ysf + Yef)/2) and the Mouth Area is specified by the box (Xsf, (Ysf + Yef)/2, Xef, Yef).
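The splitting of the Skin Area Box can be sketched in a few lines. The function name and the use of integer division for the midpoint are assumptions; the box formulas are those given above.

```python
def split_face_box(xsf, ysf, xef, yef):
    """Split the Skin Area Box into an upper Eye Area and a lower Mouth Area."""
    y_mid = (ysf + yef) // 2  # integer midpoint (rounding is an assumption)
    eye_area = (xsf, ysf, xef, y_mid)
    mouth_area = (xsf, y_mid, xef, yef)
    return eye_area, mouth_area


# Hypothetical face box with corners (40, 20) and (200, 220).
eye_area, mouth_area = split_face_box(40, 20, 200, 220)
```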

Fig. 27(b) shows the next stage of processing. The inputs to the component 66 are either the Mouth Area or the Eye Area parameters, which are stored in X1, X2, Y1 and Y2. The combination of counters, comparators and adder produces a pixel address stream which defines the box area in the context of the original image. By using this pixel address as the address to the original image memory, it is possible to process only that part of the image which is defined as being either Eye Area or Mouth Area, because only these pixels are read from the image memory and passed to other image processing tasks. This technique is advantageous because only the pixels within the defined areas, and not the whole image, are processed by the post image processing hardware, and hence processing is faster. Also shown is a multiplexer (MUX) and the Image Pixel Stream which is used to load the original image into the memory at each frame.
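A software analogue of the address generation may help to illustrate the idea. The sketch assumes a row-major image memory in which the address of a pixel is y multiplied by the screen width plus x, and it assumes inclusive box limits; neither detail is stated in the description, and the function names are hypothetical.

```python
def box_addresses(x1, x2, y1, y2, screen_width):
    """Yield the linear address of every pixel inside the box, row by row."""
    for y in range(y1, y2 + 1):          # row counter, compared against Y2
        for x in range(x1, x2 + 1):      # column counter, compared against X2
            yield y * screen_width + x   # multiply and add to form the address


def read_box(image_memory, x1, x2, y1, y2, screen_width):
    """Read only the pixels inside the box from a flat image memory."""
    addresses = box_addresses(x1, x2, y1, y2, screen_width)
    return [image_memory[address] for address in addresses]


# Hypothetical 4 x 3 image stored as a flat list; each pixel value
# equals its own address so the result is easy to check.
image_memory = list(range(12))
pixels = read_box(image_memory, x1=1, x2=2, y1=1, y2=2, screen_width=4)
```

Only four of the twelve stored pixels are read here, which is the source of the speed advantage mentioned above.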

Position and area data for the mouth area is generated by the unit 14 in step 32 as shown in Fig. 30. The following sub-steps are involved:

(a) histogram

(b) backprojection

(c) smoothing #2

(d) max. value search

(e) smoothing #1

(f) histogram

(g) threshold value search

(h) threshold of image

(i) binary erosion

(j) binary dilation

(k) projection X & Y

(l) projection search, and

(m) area counting.

The unit 14 which carries out these tasks is shown in Fig. 23(a) and (b). The sub-steps, having been previously described for facial and fingertip detection, will be readily understood from the above description. A difference, of course, is that a masked image is used. An output signal representing mouth area and position is generated.

In step 33, eye area detection is carried out by the unit 15, and reference is made to Figs. 31 to 37. The sub-steps involved are outlined in Figs. 31(a) and 31(b) and the unit 15 is shown in detail in Figs. 32(a), 32(b), 32(c) and 33. The unit 15 comprises an AND gate 70 for reception of the eyes mask and RGB image data. A transformation circuit 71 is connected to the gate 70, and is in turn connected to a multi-template matching circuit 72. The circuit 71 is an SRAM look-up table. The multi-template matching circuit 72 comprises a set of matching circuits 75 connected to a comparator 76, in turn connected to a smoothing #2 circuit 77 similar to that previously described. Further, there are first and second ALUs 78 and 79 connected to a set constant circuit 80. A template matching circuit 75 is shown in detail in Fig. 33.

In operation, the main inputs are the eye mask data from the unit 12 and the normalised input image from the unit 10, the purpose being to determine the eye positions. After the signals are processed by the AND gate 70, the SRAM 71 sums the R, G and B values of each pixel and divides by three to give a grey scale value as shown in Fig. 34. The multi-template matching circuit 72 determines the best match for each eye as shown in Figs. 35 to 37, a comparator selecting the closest match. The image is smoothed to determine peak intensities for the eyes by the two ALUs. A constant is set to ensure that there is no repetition in determining peaks. The circuit 75 for multi-template matching is an 8 x 8 correlator as illustrated in Fig. 33. In this circuit the subscript e1 represents the first eye and e2 the second eye.
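The grey scale transformation and the search for two distinct peaks can be sketched as follows. This is an illustrative software analogue only: the function names, the size of the box overwritten around the first peak (half_box) and the example match image are assumptions, and the template matching itself is not reproduced.

```python
def to_grey(rgb_image):
    """Grey(x, y) = (R + G + B) / 3, using integer division."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_image]


def max_position(image):
    """Return (x, y) of the largest value, first one found in raster order."""
    best_value, best_xy = image[0][0], (0, 0)
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy


def set_constant(image, xs, ys, xe, ye, const):
    """Write const inside the box (xs, ys)-(xe, ye); keep other pixels."""
    return [
        [const if xs <= x <= xe and ys <= y <= ye else value
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]


def find_two_peaks(match_image, half_box=1, const=0):
    """Find the first peak, then the best peak outside a box around it."""
    x1, y1 = max_position(match_image)
    blanked = set_constant(match_image, x1 - half_box, y1 - half_box,
                           x1 + half_box, y1 + half_box, const)
    return (x1, y1), max_position(blanked)


# Hypothetical smoothed match image: the first eye gives a broad response
# around (1, 2); the second eye gives a weaker peak at (5, 2).
match_image = [[0] * 7 for _ in range(5)]
match_image[2][1] = 90
match_image[2][2] = 85
match_image[2][5] = 80

first_eye, second_eye = find_two_peaks(match_image)
```

Without the set constant step, the second search would return the neighbouring pixel (2, 2) of the first eye; overwriting the box around the first peak is what prevents the same eye from being reported twice.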

In step 34 a consistency check is carried out to ensure that the general locations of the detected areas are consistent with the general face layout. In step 35 the detected area is displayed.

A decision is then made by the workstation 5 to determine the nearest facial part to the fingertip in step 37, provided the flag TFLAG has been set to one as indicated by the decision step 36. The result is then displayed in step 38 and this involves display of the relevant part in a different colour within a square box. The process ends in step 39.
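The decision of steps 36 and 37 can be sketched as below. The description does not state how nearness is measured, so the Euclidean distance between the fingertip position and a single reference point per facial part is an assumption, as are the function name and the coordinates.

```python
import math


def nearest_facial_part(fingertip, parts, tflag):
    """Return the name of the part closest to the fingertip.

    Returns None when TFLAG is not 1, i.e. no valid fingertip was detected.
    """
    if tflag != 1:
        return None
    return min(parts, key=lambda name: math.dist(fingertip, parts[name]))


# Hypothetical part positions (x, y) in image coordinates.
parts = {
    "mouth": (128, 180),
    "left eye": (100, 100),
    "right eye": (156, 100),
}
result = nearest_facial_part((150, 110), parts, tflag=1)
no_result = nearest_facial_part((150, 110), parts, tflag=0)
```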

Another aspect of the invention is shown in Figs. 38 and 39, whereby the system acts as a communications support system comprising a TV camera 100, an image processing device 101 and a workstation 102 with a video screen 103. The image processing device 101 and the workstation 102 carry out the process steps 110 to 120 shown in Fig. 39. Basically, this system involves use of a sub-set of the circuits shown in the relevant previous drawings. The steps include capture of the next frame image in step 110 followed by normalisation in step 111 and facial area detection in step 112. In step 113 a consistency check is carried out and the system proceeds to step 114 involving display of the detected area if the consistency check output is positive. In step 115 there is detection of the facial part mask and the input image is masked in step 116 for mouth area detection 117. After a consistency check 118, the detected area is displayed in step 119 and the method ends in step 120.

It will be appreciated that the communication support system shown in Fig. 38 allows a TV camera to pick up the facial image of a non-deaf person and provides for the generation of a facial image to detect the location of the mouth and its area. This information is transmitted to the workstation 102 which displays the mouth image after the image is magnified. This greatly facilitates lip reading by a deaf person.

A still further aspect of the invention is now described with reference to Figs. 40 and 41. A system 130 comprises a TV camera 131, an image processing device 132, a workstation 133 and a video screen 134. The purpose of the system 130 is to provide for the inputting of machine commands by silent lip movement. The TV camera 131 picks up the user's facial image and the device 132 processes the image to detect the location of the mouth and its area in real time. Using this information, the device recognises the word (command) and this is transmitted to the workstation 133. The workstation 133 is controlled by this command to generate a display as shown on the video screen 134. Operation of the device 132 is described in Fig. 41 and it involves capture of the next frame image in step 140, normalisation 141 and facial area detection 142. There is a consistency check 143 and the detected area is displayed in step 144. Step 145 involves detection of the facial part mask and the input image is masked in step 146 for mouth area detection 147. After a consistency check 148, the detected area is displayed in step 149 and in step 150 the positional area inputs are matched with templates for generation of a command.



Claims 

1. An image processing apparatus (1) comprising means for receiving facial image data, and processing means for processing the facial image data, characterised in that,

said receiving means (10) comprises means for receiving a colour facial image data stream in digital format;

said processing means (3) comprises:-

a plurality of image data processing circuits (11-15) interconnected for processing the image data in a pipelining manner to provide a real time output, the circuits comprising:-

means (51-57) for monitoring colour values in the image data to identify image data pixels representing facial parts, and

means (59-61) for determining states of the facial parts by monitoring positional coordinates of the identified pixels; and the apparatus (1) further comprises:-

an output device (6) for outputting the determined facial part state data in real time.

2. An apparatus as claimed in claim 1 wherein the monitoring means (51-57) comprises means for monitoring frequency of occurrence of pixel values in the image data.

3. An apparatus as claimed in claim 1 wherein the monitoring means (51-54) comprises means for carrying out colour histogram matching to monitor frequency of occurrence of pixel values in the image data.

4. An apparatus as claimed in claim 2 wherein the monitoring means (51) comprises means for carrying out three-dimensional colour histogram matching to monitor frequency of occurrence of pixel values in the image data.

5. An apparatus as claimed in claim 3 wherein the monitoring means further comprises a backprojection means (51) for comparing a generated histogram with a template histogram generated off-line.

6. An apparatus as claimed in claim 1 wherein the determining means further comprises a counter (60) for determining the area of the facial part or parts.

7. An apparatus as claimed in claim 1 further comprising means (10) for normalising the received colour facial image data stream.

8. An apparatus as claimed in claim 1 wherein the processing circuits comprise:-

a facial area detection unit (11) comprising circuits for carrying out facial area detection operations on the image data; and

a facial part detection unit (14, 15) connected to said facial area detection unit and comprising processing circuits for determining states of a facial part within the detected facial area using the image data and an output signal from said facial area detection unit.

9. An apparatus as claimed in claim 8 further comprising a mask generating unit (12) connected between the facial area and facial part detection units.

10. An apparatus as claimed in claim 8 comprising two facial part detection units, namely:-

a mouth area detection unit (14); and

an eye area detection unit (15).

11. An apparatus as claimed in claim 10 wherein the apparatus further comprises a mask generating unit (12) connected between the facial area detection unit (11) and the facial part detection units (14, 15), said mask generating unit (12) comprising means for generating an eyes mask signal and a mouth mask signal.

12. An apparatus as claimed in claim 10 wherein the mouth area and eye area detection units (14, 15) are connected in the apparatus to operate on the same image data in parallel.

13. An apparatus as claimed in claim 1, wherein the apparatus further comprises means (61) for carrying out a consistency check to validate determined positional data.

14. An apparatus as claimed in claims 8 to 13 wherein each processing unit comprises means (61) for carrying out a consistency check on positional data which it generates.

15. An apparatus as claimed in claim 13 wherein the consistency check means comprises a processor (61) programmed to compare the positional data with reference positional data.

16. An apparatus as claimed in claim 1 wherein the monitoring means comprises means (55, 56, 57) for carrying out binary erosion and binary dilation steps to generate image data in which noise is reduced and the subject area is recovered.

17. An apparatus as claimed in claim 1 further comprising image data processing circuits (13) for determining location of a coloured fingertip represented in the input image data.

18. An apparatus as claimed in claim 1 comprising image processing circuits (11-15) for facial area and separate image processing circuits (13) for coloured fingertips, said circuits being connected to process the input image data in parallel.

19. An apparatus as claimed in claim 17 further comprising a result processor (5) comprising means for receiving facial part and coloured fingertip positional data to generate an output signal representing proximity of a coloured fingertip to facial parts for assistance in communication of sign language.

20. An apparatus as claimed in claim 1 further comprising a video device (2) for generating the image data stream.

21. An apparatus as claimed in claim 20, wherein the video device is a camera (2) having an analog to digital converter.

22. An apparatus as claimed in claim 20 wherein the video device is a camera (2), and the camera is mounted on a frame (7) adjacent a light source (8) so that it is directed upwardly at an angle so that the face of a person seated next to the frame is within the field of view.

23. A method of processing facial image data, the method comprising the steps of:

receiving (20) a digital colour image data stream;

monitoring (22) colour values in the image data to identify image data pixels representing facial parts;

determining (22) states of the facial parts by monitoring positional coordinates of the identified pixels, said monitoring and determining steps being carried out by processing circuits interconnected for processing the image data in a pipelining manner, and

outputting (38) in real time a signal representing the determined facial part state data.
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24. A method as claimed in claim 23, wherein the 
monitoring and determining steps are carried out to initially identify facial area (22) and position, and subsequently to identify facial parts and determine states of the facial parts.

25. A method as claimed in claim 23 wherein the monitoring step comprises the sub-steps (22(a)-22(g)) of monitoring frequency of occurrence of pixel values in the image data.

26. A method as claimed in claim 23 wherein the monitoring step comprises the sub-steps of carrying out colour histogram matching (22(a)) to monitor frequency of occurrence of pixel values in the image data.

27. A method as claimed in claim 23 wherein the monitoring step comprises the sub-steps of carrying out three-dimensional colour histogram matching to monitor frequency of occurrence of pixel values in the image data.

28. A method as claimed in claim 23 wherein the monitoring step comprises the sub-steps of carrying out three-dimensional colour histogram matching in which there is backprojection (22(b)) with comparison of a generated three-dimensional colour histogram with a template histogram generated off-line.
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Fig. 5(a) 



1. Input: Input RGB image

2. Output: (Normalised) RGB image

3. Function: Translation of the RGB value of each pixel according to the following formula.

$$\text{New } R(i,j) = \frac{255 \cdot R(i,j)^2}{R(i,j)^2 + G(i,j)^2 + B(i,j)^2}$$
$$\text{New } G(i,j) = \frac{255 \cdot G(i,j)^2}{R(i,j)^2 + G(i,j)^2 + B(i,j)^2}$$
$$\text{New } B(i,j) = \frac{255 \cdot B(i,j)^2}{R(i,j)^2 + G(i,j)^2 + B(i,j)^2}$$


1. Input: Input RGB image

2. Output: (Normalised) RGB image

3. Function: Translation of the RGB value of each pixel according to the following formula.

$$\text{New } R(i,j) = \frac{255 \cdot R(i,j)}{R(i,j) + G(i,j) + B(i,j)}$$
$$\text{New } G(i,j) = \frac{255 \cdot G(i,j)}{R(i,j) + G(i,j) + B(i,j)}$$
$$\text{New } B(i,j) = \frac{255 \cdot B(i,j)}{R(i,j) + G(i,j) + B(i,j)}$$
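The two normalisation formulas above can be sketched for a single pixel as follows. The function name, the use of integer division and the handling of a black pixel (all components zero) are assumptions, since the figures do not specify rounding or the zero case.

```python
def normalise_pixel(r, g, b, squared=False):
    """Scale a pixel so that its components sum to about 255.

    squared=False uses c / (R + G + B); squared=True uses
    c^2 / (R^2 + G^2 + B^2), matching the two variants shown.
    """
    if squared:
        r, g, b = r * r, g * g, b * b
    total = r + g + b
    if total == 0:
        return (0, 0, 0)  # assumption: a black pixel is left black
    return tuple(255 * c // total for c in (r, g, b))


# The same hypothetical colour at two brightness levels.
bright = normalise_pixel(50, 100, 100)
dim = normalise_pixel(25, 50, 50)
bright_squared = normalise_pixel(50, 100, 100, squared=True)
```

Both brightness levels map to the same normalised value, which illustrates why normalisation makes the later colour matching less sensitive to lighting intensity.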
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FUNCTION 



22(a)

r = R / Bucket size, g = G / Bucket size, b = B / Bucket size
hist(r, g, b)++

PARAMETERS

Bucket size

Fig. 10
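The histogram function of sub-step 22(a) can be sketched as follows. The use of a sparse counter keyed by (r, g, b) bucket indices, the function name and the example pixels are assumptions made for illustration.

```python
from collections import Counter


def colour_histogram(pixels, bucket_size):
    """Count pixels per (r, g, b) bucket: hist(r, g, b)++ for each pixel."""
    hist = Counter()
    for red, green, blue in pixels:
        key = (red // bucket_size, green // bucket_size, blue // bucket_size)
        hist[key] += 1
    return hist


# Hypothetical pixels: two similar skin-like colours and one dark pixel.
pixels = [(200, 120, 100), (205, 125, 98), (10, 10, 10)]
hist = colour_histogram(pixels, bucket_size=16)
```

With a bucket size of 16, the two similar colours fall into the same bucket, so small colour variations are absorbed by the histogram.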
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Backproiecdon 



INPUT image

Histograms used: 3D histogram (input image); 3D histogram(temp1), 3D histogram(temp2), 3D histogram(temp3), ..., 3D histogram(tempn)

Step 1. Addressing according to x

r = R / Bucket size, g = G / Bucket size, b = B / Bucket size

If (r, g, b) = (0, 0, 0) then Rx = 0
If Ix = 0 then Rx = 0
If Mnx = 0 then Rx = 0

Step 2. Corresponding histogram value Ix of input

Step 3. Corresponding histogram value M1x of Template 1

If M1x >= Ix then R1x = Ix / M1x else R1x = M1x / Ix

Step 4. Corresponding histogram value M2x of Template 2

If M2x >= Ix then R2x = Ix / M2x else R2x = M2x / Ix

Step 5. Corresponding histogram value M3x of Template 3

If M3x >= Ix then R3x = Ix / M3x else R3x = M3x / Ix

Step 6. Corresponding histogram value Mnx of Template n

If Mnx >= Ix then Rnx = Ix / Mnx else Rnx = Mnx / Ix

Step 7. Rx = Scale * Max{M1x, M2x, M3x, ..., Mnx}

Step 8. Send out the Rx
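Steps 1 to 8 above can be sketched as below. Two points are assumptions: the sketch takes the maximum over the ratios R1x to Rnx computed in steps 3 to 6 (the text of step 7 as it appears here names M1x to Mnx), and a template whose histogram value is zero simply contributes a ratio of zero. The function names and example data are hypothetical.

```python
from collections import Counter


def ratio(input_count, template_count):
    """Smaller count divided by larger count; 0 if either count is 0."""
    if input_count == 0 or template_count == 0:
        return 0.0
    if template_count >= input_count:
        return input_count / template_count
    return template_count / input_count


def backproject(pixels, input_hist, template_hists, bucket_size, scale=255):
    """Replace each pixel by Scale times its best histogram ratio."""
    output = []
    for red, green, blue in pixels:
        key = (red // bucket_size, green // bucket_size, blue // bucket_size)
        if key == (0, 0, 0):          # step 1: bucket (0, 0, 0) gives Rx = 0
            output.append(0)
            continue
        ix = input_hist.get(key, 0)   # step 2: input histogram value
        best = max(ratio(ix, template.get(key, 0))
                   for template in template_hists)
        output.append(int(scale * best))
    return output


# Hypothetical input: two skin-like pixels, one blue pixel, one dark pixel.
pixels = [(200, 120, 100), (200, 120, 100), (40, 40, 200), (5, 5, 5)]
input_hist = Counter((r // 16, g // 16, b // 16) for r, g, b in pixels)
skin_template = Counter({(12, 7, 6): 4})   # hypothetical off-line template
bp_image = backproject(pixels, input_hist, [skin_template], bucket_size=16)
```

Pixels whose colour bucket is well represented in a template receive a high output value, while colours absent from every template are suppressed to zero.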
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Fig. 
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else P(x t y)=Consil 



Kernel: Mx by My, centred on (x, y)

Mx, My = 3, 5, 7, 9

OUTPUT image

PARAMETERS

Const1, Const2, Mx, My

Fig. 17
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Binary Dilation* 




PARAMETERS 



Const1, Const2, Mx, My

Fig.8(a)
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Binary Dilation 



INPUT image

FUNCTION

If p(x, y) = Const2 then P(x, y) = Const2

If (p(x, y) = Const1) AND (all pixels in Kernel at (x, y) = Const1) then P(x, y) = Const1
else P(x, y) = Const2

Kernel: Mx by My, centred on (x, y)

Mx, My = 3, 5, 7, 9

OUTPUT image

PARAMETERS

Const1, Const2, Mx, My



Fig ®(b) 
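The dilation rule above, with Const1 = 0 as background and Const2 = 1 as foreground, can be sketched as follows. The erosion function is written as the dual of that rule, which is an assumption because the erosion figure is not fully legible here. Ignoring kernel positions outside the image is also an assumption, and all names are hypothetical.

```python
def _keep_if_uniform(image, keep, other, mx, my):
    """Output keep where the whole kernel equals keep, otherwise other."""
    height, width = len(image), len(image[0])
    rx, ry = mx // 2, my // 2
    out = [[other] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            uniform = image[y][x] == keep
            for dy in range(-ry, ry + 1):
                for dx in range(-rx, rx + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < height and 0 <= xx < width
                    if inside and image[yy][xx] != keep:
                        uniform = False
            if uniform:
                out[y][x] = keep
    return out


def binary_dilation(image, const1=0, const2=1, mx=3, my=3):
    """A pixel stays const1 only if its whole kernel is const1."""
    return _keep_if_uniform(image, const1, const2, mx, my)


def binary_erosion(image, const1=0, const2=1, mx=3, my=3):
    """Dual rule: a pixel stays const2 only if its whole kernel is const2."""
    return _keep_if_uniform(image, const2, const1, mx, my)


# Hypothetical binary image: a 3 x 3 subject block plus one noise pixel.
block = [[1 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(7)]
         for y in range(7)]
noisy = [row[:] for row in block]
noisy[0][6] = 1

eroded = binary_erosion(noisy)
recovered = binary_dilation(eroded)
```

Erosion removes the isolated noise pixel and shrinks the block to its centre; the following dilation restores the block, so noise is reduced while the subject area is recovered.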
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Projection X&Y 
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Projection Search 



INPUT

Proj1(x), Proj2(y)

FUNCTION

A(Y) = Σ Proj2(y) (summation from Ymin to Y)
A(Y') = Σ Proj2(y) (summation from Ymax to Y')
A(X) = Σ Proj1(x) (summation from Xmin to X)
A(X') = Σ Proj1(x) (summation from Xmax to X')

OUTPUT

The output Xs is the first X which satisfies the condition: A(X) > Max level * N
The output Xe is the first X' which satisfies the condition: A(X') > Max level * N

The output Ys is the first Y which satisfies the condition: A(Y) > Max level 1 * N
The output Ye is the first Y' which satisfies the condition: A(Y') > Max level 1 * N

PARAMETERS

Xmin, Xmax
Ymin, Ymax
N

Fig. 20
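The projection and projection search operations can be sketched as follows. The figure does not define "Max level" in the text available here, so the sketch assumes it is the largest value of the projection within the search range; the function names and the example image are hypothetical.

```python
def projections(binary_image):
    """Return Proj1(x) (column sums) and Proj2(y) (row sums)."""
    proj_x = [sum(column) for column in zip(*binary_image)]
    proj_y = [sum(row) for row in binary_image]
    return proj_x, proj_y


def first_crossing(projection, indices, threshold):
    """Accumulate along indices; return the first index exceeding threshold."""
    running = 0
    for index in indices:
        running += projection[index]
        if running > threshold:
            return index
    return None


def projection_search(projection, low, high, n=0.1):
    """Find start and end coordinates of the object along one axis."""
    max_level = max(projection[low:high + 1])  # assumed meaning of Max level
    threshold = max_level * n
    start = first_crossing(projection, range(low, high + 1), threshold)
    end = first_crossing(projection, range(high, low - 1, -1), threshold)
    return start, end


# Hypothetical binary image with an object in rows 1-3 and columns 2-5.
binary_image = [[1 if 1 <= y <= 3 and 2 <= x <= 5 else 0 for x in range(8)]
                for y in range(6)]
proj_x, proj_y = projections(binary_image)
xs, xe = projection_search(proj_x, 0, 7)
ys, ye = projection_search(proj_y, 0, 5)
```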
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Area counting 




PARAMETERS 



(Xs, Ys) (Xe, Ye)
Const1



EP 0 654 749 A2 



Coloured fingertip detection

25

Step | Input | Output | Parameter (unfixed)
3D Histogram | RGB image | 3D histogram | Bucket size
Back projection | RGB image; 3D histogram; 3D histogram(temp1); 3D histogram(temp2); 3D histogram(temp3); 3D histogram(tempn) | BPimage | Bucket size; Number of templates; Scale
Smoothing | BPimage | Smoothed BPimage | Filter size: 5
Max value search | Smoothed BPimage | Max value; XY coordinate | (Xs, Ys)(Xe, Ye): (0, 0)(255, 255)
Smoothing 1 | Smoothed BPimage | SBPimage | Gauss filter size: 5 (sigma=1)
Histogram | SBPimage | BPhistogram | (Xs, Ys)(Xe, Ye) <-- XY coordinate, width & height
Threshold value search | BPhistogram | TH | fTH; Pm: 255, Search Direction: Backward
Threshold of image | SBPimage | BiBPimage | Const1: 1, Const2: 0, TH1: TH, TH2: 255; (Xs, Ys)(Xe, Ye) <-- XY coordinate, width & height
Binary Erosion (Repeat) | BiBPimage | ErBiBPimage | Const1: 0, Const2: 1, Mx: 3, My: 3; No. of erosion = SQR(fTH)/12.0
Binary Dilation* (Repeat) | ErBiBPimage; BiBPimage | DiErBiBPimage | Const1: 0, Const2: 1, Mx: 3, My: 3; No. of dilation = No. of erosion * 1.6
Projection X&Y | DiErBiBPimage | projectionX; projectionY | (Xs, Ys)(Xe, Ye) <-- XY coordinate, width & height
Projection Search | projectionX; projectionY | Location coordinate (Xst, Yst) (Xet, Yet) | Xmin, Xmax, Ymin, Ymax <-- XY coordinate, width & height; N=0.1
Area counting | DiErBiBPimage | At: Area of face | (Xs, Ys)(Xe, Ye): (Xst, Yst)(Xet, Yet); Const1: 1
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Convolution (Smoothing 4fc2 



INPUT image

FUNCTION

25(c)

Multiplied by weights and summation

$$P\left(x + \tfrac{M_x}{2},\; y + \tfrac{M_y}{2}\right) = \frac{\sum_i \sum_j M(i,j)\, p(x+i,\, y+j)}{\sum_i \sum_j M(i,j)}$$

Kernel M(i, j) with i = 0 to Mx-1 and j = 0 to My-1, all weights equal to 1

Smoothing2: Mx = My = 3
Smoothing2: Mx = My = 5
Smoothing2: Mx = My = 9

OUTPUT image

P(X, Y)

PARAMETERS

Filter size: 3, 5, 9
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Max value search 



25(d)

INPUT image

(xm, ym) = {(x, y) | Max{p(x, y) for xs <= x <= xe, ys <= y <= ye}}

PARAMETERS

(Xs, Ys) (Xe, Ye)

Fig. 25
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XsF 



[Figs. 27(a) and 27(b): block diagrams of the mask generation hardware. Recoverable labels: Xsf, Ysf, Xef, Yef; Adder; Divide (by 2); storage registers X1, X2, Y1, Y2; Eye Area / Mouth Area multiplexer; Counter and Comparator (two of each); Adder; Multiplier (Const. = screen size); Image pixel stream; Image memory.]
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Detection of facial part mask 



30

Step | Input | Output
Generation of Position Mask of Mouth | Position: (Xsf, Ysf)(Xef, Yef) | MMaskimage: (Xsmm, Ysmm)(Xemm, Yemm)
Generation of Position Mask of Eyes | Position: (Xsf, Ysf)(Xef, Yef) | EMaskimage: (Xsme, Ysme)(Xeme, Yeme)
Generation of MouthMask (AND Operation) | BFimage; MMaskimage: (Xsmm, Ysmm)(Xemm, Yemm) | MouthMask
Generation of EyeMask (AND Operation) | BFimage; EMaskimage: (Xsme, Ysme)(Xeme, Yeme) | EyesMask

[Diagram: BFimage with corners (Xsf, Ysf) and (Xef, Yef); EMaskimage: (Xsme, Ysme)(Xeme, Yeme); MMaskimage: (Xsmm, Ysmm)(Xemm, Yemm); MouthMask = MMaskimage AND BFimage; EyesMask = EMaskimage AND BFimage.]

Masking of the input image

31

Step | Input | Output
Mask the input image (AND Operation to bit planes) | (Norm) RGB image; MouthMask | MRGB image

[Diagram: MouthMask AND (Norm) RGB image.]
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Mouth area detection 



Step | Input | Output | Parameter (unfixed)
Histogram | MRGB image | 3D histogram | Bucket size
Backprojection | MRGB image; 3D histogram; 3D histogram(temp1); 3D histogram(temp2); 3D histogram(temp3); 3D histogram(tempn) | BPimage | Bucket size; Number of templates; Scale
Smoothing2 | BPimage | Smoothed BPimage | Filter size: 9
Max value search | Smoothed BPimage | Max value; XY coordinate | (Xs, Ys)(Xe, Ye):
Smoothing 1 | Smoothed BPimage | SBPimage | Gauss filter size (sigma=1)
Histogram | SBPimage | BPhistogram | (Xs, Ys)(Xe, Ye) <-- XY coordinate, width & height
Threshold value search | BPhistogram | TH | fTH; Pm: 255, Search Direction: Backward
Threshold of image | SBPimage | (Partial) BiBPimage | Const1: 1, Const2: 0, TH1: TH, TH2: 255; (Xs, Ys)(Xe, Ye) <-- XY coordinate, width & height
Binary Erosion (Repeat) | BiBPimage | ErBiBPimage | Const1: 0, Const2: 1, Mx: 3, My: 3; No. of erosion = SQR(fTH)/12.0
Binary Dilation* (Repeat) | ErBiBPimage; BiBPimage | DiErBiBPimage | Const1: 0, Const2: 1, Mx: 3, My: 3; No. of dilation = No. of erosion * 1.6
Projection X&Y | DiErBiBPimage | projectionX; projectionY | (Xs, Ys)(Xe, Ye) <-- XY coordinate, width & height
Projection Search | projectionX; projectionY | Location coordinate (Xsm, Ysm) (Xem, Yem) | Xmin, Xmax, Ymin, Ymax <-- XY coordinate, width & height; N=0.1
Area counting | DiErBiBPimage | Am: Area of face | (Xs, Ys)(Xe, Ye): (Xsm, Ysm)(Xem, Yem); Const1: 1
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Eve area detection 



Step | Input | Output | Parameter (unfixed)
Transformation to grey scale | ERGB image | GERGB image |
Reduction of image size | GERGB image | RGERGB image | Mx=My=2 (Reduction rate = 1/2)
Multi-Template Matching | RGERGB image; Eye Template1; Eye Template2; Eye Template3; Eye Templaten | Position: (Xe1, Ye1); Position: (Xe2, Ye2) |
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Fig. S/&) 



Multi-Template Matching

Step | Input | Output | Parameter (unfixed)
Template Matching | RGERGB image; Eye Template1 | TM1 image | Template address; N
Template Matching | RGERGB image; Eye Template2 | TM2 image | Template address; N
Template Matching | RGERGB image; Eye Template3 | TM3 image | Template address; N
Template Matching | RGERGB image; Eye Template n | TMn image | Template address; N
Max pixel selection | TM1 image; TM2 image; TM3 image; TMn image | TM image | No. of image; its address
Convolution (Smoothing2) | TM image | CTM image | Filter size: 9
Max value search | CTM image | Position: (Xe1, Ye1) | (Xs, Ys)(Xe, Ye): (0, 0)(127, 127)
Set Constant | CTM image | CTMO image | (Xs, Ys)(Xe, Ye) <-- (Xe1, Ye1); Const1
Max value search | CTMO image | Position: (Xe2, Ye2) | (Xs, Ys)(Xe, Ye): (0, 0)(127, 127)

Fig. 31(b)
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75 



[Fig. 33: template matching circuit 75 (8 x 8 correlator). Recoverable labels: Grey Scale image Pixel Stream; 256-bit shift registers; 8-bit XOR stages; tree of adders of increasing width.]
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Transformation to prcv scale 



INPUT image

FUNCTION

33(a)

Grey(x, y) = (R(x, y) + G(x, y) + B(x, y)) / 3

OUTPUT image

PARAMETERS

None

Fig. 34




PARAMETERS 

Template Address 
N 
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53fc I 



INPUT image

Max pixel selection

R1(x, y), R2(x, y), R3(x, y), ..., Rn(x, y)

R0(x, y) = Max(R1(x, y), R2(x, y), R3(x, y), ..., Rn(x, y))

OUTPUT image

R0(x, y)

PARAMETERS

No. of input image and its address
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33(c) Set Constant 
INPUT image

If (xs <= x <= xe) and (ys <= y <= ye) then p(x, y) = Const1 else p(x, y) = original value

OUTPUT image

p(x, y)

[Diagram: rectangle with corners (xs, ys) and (xe, ye) containing the point (x, y).]

PARAMETERS

(xs, ys) (xe, ye)
Const1
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